Optimizing Synchronization Operations for Remote Memory Communication Systems

Authors

Darius Buntinas

Amina Saify

Dhabaleswar K. Panda

Jarek Nieplocha

Abstract

Synchronization operations, such as fence and locking, are used in many parallel operations accessing shared memory. However, a process that is blocked waiting for a fence operation to complete, or for a lock to be acquired, cannot perform useful computation. It is therefore critical that these operations be implemented as efficiently as possible to reduce the time a process waits idle. These operations also impact the scalability of the overall system. As system sizes get larger, the number of processes potentially requesting a lock increases. In this paper we describe the design and implementation of an optimized operation which combines a global fence operation and a barrier synchronization operation. We also describe our implementation of an optimized lock algorithm. The optimizations have been incorporated into the ARMCI communication library. The combined global fence and barrier operation improves on the current implementation by a factor of up to 9 on a 16-node system, while the optimized lock implementation improves on it by a factor of up to 1.25. These optimizations allow for more efficient and scalable applications.


Similar resources

ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-time Systems

This paper introduces a new portable communication library called ARMCI. ARMCI provides one-sided communication capabilities for distributed array libraries and compiler run-time systems. It supports remote memory copy, accumulate, and synchronization operations optimized for non-contiguous data transfers including strided and generalized UNIX I/O vector interfaces. The library has been employe...


Analyses and Optimizations for Shared Address Space Programs

We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another. The analysis, called cycle analysis, is based on work by Shasha and Snir and checks for cycles a...


Compiler-Assisted Distributed Shared Memory Schemes Using Memory-Based Communication Facilities

To execute shared-memory-based parallel programs efficiently, we introduce two compiler-assisted software cache schemes which are well-suited to automatic optimizations of remote communications. One scheme is a full user-level software cache (User-level Distributed Shared Memory: UDSM) and another is a page-based cache (Asymmetric Distributed Shared Memory: ADSM) which exploits TLB/MMU only in ...


Optimizing Collective Communication on Multicores

As the gap in performance between the processors and the memory systems continues to grow, the communication component of an application will dictate the overall application performance and scalability. Therefore it is useful to abstract common communication operations across cores as collective communication operations and tune them through a runtime library that can employ sophisticated automa...


Memory-Based Communication Facilities and Asymmetric Distributed Shared Memory

In general-purpose parallel and distributed systems, performance of the protected and virtualized user-level communications and synchronizations is the most crucial issue to realize efficient execution environments. We proposed a novel high-speed user-level communication and synchronization scheme “Memory-Based Communication Facilities (MBCF)” for a general-purpose system with an off-the-shelf ...




Publication date: 2003
